This is the README file for replicating the analysis in Advani, Elming and Shaw, "The Dynamic Effects of Tax Audits". The code for reproducing the analysis is written as Stata "do files", which can be run in Stata version 12.1 or higher.

The data required to run the code can be obtained by approved researchers via the HMRC Datalab. Data may only be accessed within the HMRC Datalab secure research facility. To apply for access to HMRC administrative data, see https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/801989/HMRC_DatalabProjectProposalApplication.odt

The relevant datasets are:
1. "Valid Views" 1999-2012: this dataset contains income and demographic information for individuals who filed an income tax self assessment return. For a given year the data are split across two files, "valid view" and "invalid view", according to whether a particular geographic identifier is available. These two files are always appended, and together they cover the population of self assessment filers.
2. "SA302" 1999-2012: this dataset contains summary information on income and information on tax amounts owed for individual taxpayers. 
3. "CQI" 1999-2009: this dataset ("compliance quality initiative") contains information on income tax self assessment enquiries undertaken by HMRC. "Enquiries" is the HMRC term for audits.
4. In addition, we used an inflation series (the consumer price index), originally obtained from the Office for National Statistics (ONS). It can be downloaded freely from the ONS website, and would need to be imported into the Datalab by the user.
Full variable lists and documentation for datasets 1-3 are available within the HMRC Datalab facility. The code provided includes variable labels for the variables used in the analysis.
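
As a rough illustration of how the "valid view" and "invalid view" files for a year are combined, the sketch below appends two toy datasets. All variable names and values are made up for the example and do not correspond to the Datalab data; the cleaning code in this archive performs the actual append.

```stata
* Illustrative sketch only: toy data with hypothetical variable names
clear
input id year income
1 2010 100
2 2010 200
end
tempfile validView
save `validView'

clear
input id year income
3 2010 300
end
tempfile invalidView
save `invalidView'

* Stack the two files so that together they cover all filers for the year
use `validView', clear
append using `invalidView', generate(fromInvalidView)

* fromInvalidView is 0 for rows from the first file and 1 for appended rows
list
```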

Because the data may not be taken outside the Datalab facility, the research can be replicated by requesting a new project within the HMRC Datalab, requesting that the code files in this archive be imported into the lab, and running the code within the secure facility.

The code files included are as follows:
1. master.do: this is the master do file. Subject to point 2 below, simply running this file will set up the analysis dataset from the raw underlying data, and then produce all the tables, and the datasheets that underlie all of the charts. (Note: because the data have to be cleared from the Datalab in Excel format, the charts were made later in Excel, outside the lab environment, from the exported datasheets rather than in Stata.) The comments in the master file provide an explicit mapping between the code and the results in the paper.
2. paths.do: before running the master file, this do file must be updated to provide the paths to the folders for the user's particular project.
3. CQI_cleanEnquiriesData.do: cleans the CQI data.
4. VV_cleanData.do: cleans the Valid Views data.
5. SA302_cleanData.do: cleans the SA302 data.
6. CV_buildInitialDescriptives.do: produces some descriptives based only on the CQI and Valid Views datasets.
7. CVS_constructCouldHaveAnalysisDataset.do: constructs the main analysis dataset, based on an unconditional random sampling procedure for controls.
8. CVS_constructMatchedCouldHaveDataset.do: constructs an alternative analysis dataset, based on a stratified sampling procedure for controls.
9. CVS_buildInitialDescriptives.do: produces some descriptives based on the analysis dataset.
10. CVS_couldHaveBalancingTest.do: produces tables showing sample balance.
11. CVS_couldHaveRegressions.do: produces the main results on dynamic effects using the main sample.
12. CVS_reweightedDynamics.do: produces the analogous results on dynamic effects using a reweighted regression to adjust for differences in observables.
13. CVS_forwardLookingRegressions.do: produces the main results for effects by audit outcome.
14. AppendixC.do: produces the robustness checks covered in Appendix C.
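
As a rough sketch of the kind of edit item 2 describes, a paths file could set one global macro per project folder. The macro names and folder locations below are hypothetical; the names actually used are those defined in the paths.do supplied in this archive.

```stata
* Illustrative sketch only: macro names and folders are hypothetical,
* not the ones used in the supplied paths.do
global rawData  "C:/path/to/project/raw"
global workData "C:/path/to/project/work"
global output   "C:/path/to/project/output"

* Later code can then refer to a folder through its macro
display "Raw data folder: $rawData"
```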

Note that many of the do files define a Stata program rather than executing commands directly. In these cases, running the do file only loads the program; the program must then be called for it to have any effect. The master file includes calls to these programs.
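
The difference between loading a program and calling it can be sketched as follows. The program name and body here are hypothetical placeholders and are not taken from the do files in this archive.

```stata
* Illustrative sketch only: exampleDescriptives is a hypothetical program
capture program drop exampleDescriptives
program define exampleDescriptives
    display "Producing descriptives..."
end

* Up to this point the program has only been defined; nothing has run.
* Calling it by name is what executes its contents:
exampleDescriptives
```

When working outside master.do, the equivalent steps would be to run the relevant do file first and then type the name of the program it defines.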
